SLC Crime Statistics 2016

Date: 12/05/2018
App: https://srinivasanswaminathan.shinyapps.io/SLCPD/

Introduction

An ideal situation for any law enforcement agency is to prevent a crime from happening in the first place. In the unfortunate event that a crime does occur, the agency’s immediate objective is to intervene as quickly as possible to defuse the situation. A critical component of achieving these objectives is the presence of police in crime hotspots: a visible police presence both deters crime and improves response times.

Objective

The app can be used by civilians to view crime hotspots on a map of Salt Lake City. This is useful for tourists or new residents who want to be aware of neighborhoods with a history of crime. These are important factors when choosing an apartment to rent, a house to buy, or places to visit.

In addition, law enforcement agencies can use this app for the following purposes:
* Deploy police in crime hotspots to both deter crime and improve response times
* Split the police grids into clusters based on the nature of crime
* Tailor recruitment, training, personnel transfers, and the exchange of best practices to each cluster group
* Help city councils implement anti-crime programs and perform analysis for policy research

Load Packages

library(shiny)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.6
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr)
library(skimr)
library(DT)
## 
## Attaching package: 'DT'
## The following objects are masked from 'package:shiny':
## 
##     dataTableOutput, renderDataTable
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(leaflet)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(sp)
library(rgdal)
## rgdal: version: 1.3-6, (SVN revision 773)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
##  Path to GDAL shared files: C:/Users/swati/Documents/R/win-library/3.5/rgdal/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: C:/Users/swati/Documents/R/win-library/3.5/rgdal/proj
##  Linking to sp version: 1.3-1
library(NbClust)
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(factoextra)
## Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at https://goo.gl/13EFCZ
library(formattable)
## 
## Attaching package: 'formattable'
## The following object is masked from 'package:plotly':
## 
##     style

Load crime data.

Source: https://opendata.utah.gov/Public-Safety/SALT-LAKE-CITY-POLICE-CASES-2016/trgz-4r9d

We used publicly available crime data from SLCPD for the year 2016. While most of the variables are self-explanatory, a few warrant explanation:
* CASE: Unique case ID
* NCIC CODE: National Crime Information Center code
* IBR: Incident-Based Reporting code
* X.COORDINATE: State Plane NAD83 X coordinate
* Y_COORDINATE: State Plane NAD83 Y coordinate

options(stringsAsFactors = FALSE)
crime_data<-read.csv("C:\\Users\\swati\\Google Drive\\Analytical Applications\\project2\\SALT_LAKE_CITY_-_POLICE_CASES_-2016.csv",encoding ='ASCII')
names(crime_data)[1]<-"CASE"
str(crime_data)
## 'data.frame':    61240 obs. of  14 variables:
##  $ CASE        : chr  "SL201624" "SL201637" "SL201656" "SL201648" ...
##  $ NCIC.CODE   : chr  "2499-13" "5404-0" "5404-0" "5499-44" ...
##  $ DESCRIPTION : chr  "STOLEN VEHICLE" "DUI ALCOHOL" "DUI ALCOHOL" "ALCOHOL IN VEH" ...
##  $ IBR         : chr  "240" "90D" "90D" "90D" ...
##  $ OCCURRED    : chr  "01/01/2016 12:36:15 AM" "01/01/2016 01:00:24 AM" "01/01/2016 01:23:44 AM" "01/01/2016 01:10:27 AM" ...
##  $ REPORTED    : chr  "01/01/2016 12:36:15 AM" "01/01/2016 01:00:24 AM" "01/01/2016 01:23:44 AM" "01/01/2016 01:10:27 AM" ...
##  $ DAY.OF.WEEK : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ LOCATION    : chr  "1400 S EMERY ST                                                                                          " "200 W 500 S                                                                                             " "400 S WEST TEMPLE ST                                                                                    " "" ...
##  $ CITY        : chr  "SALT LAKE CITY" "SALT LAKE CITY" "SALT LAKE CITY" "SALT LAKE CITY" ...
##  $ CITY.COUNCIL: chr  "2" "4" "4" "SL" ...
##  $ POLICE.ZONE : chr  "Z2" "Z3" "Z3" "40" ...
##  $ POLICE.GRID : chr  "123" "134" "135" "40" ...
##  $ X.COORDINATE: int  1882550 1890252 1891039 1869739 1891819 1894816 1893431 1903627 1882890 1895533 ...
##  $ Y_COORDINATE: int  876155 883639 883995 890270 880413 884414 879640 891202 879376 882346 ...

Data Manipulation

After closely examining the data in the viewer, we executed the following steps to clean it up:
* Omit NA values
* Remove empty values for the variables LOCATION, CITY.COUNCIL, and OCCURRED
* Subset the data to include police zones Z1 to Z6 only, as the remaining values were inaccurate
* Convert time values into the relevant format
* Convert categorical variables into factors

crime_data<-na.omit(crime_data)
crime_data <- crime_data[-which(crime_data$LOCATION == ""), ]
crime_data<-crime_data[-which(crime_data$CITY.COUNCIL==""),]
crime_data <- crime_data[-which(crime_data$OCCURRED == ""), ]
crime_data<-subset(crime_data,POLICE.ZONE %in% c("Z1","Z2","Z3","Z4","Z5","Z6"))

crime_data$OCCURRED<- mdy_hms(crime_data$OCCURRED)
crime_data$REPORTED<- mdy_hms(crime_data$REPORTED)
cols<- c("NCIC.CODE","DESCRIPTION","IBR","DAY.OF.WEEK","LOCATION","CITY","CITY.COUNCIL", "POLICE.ZONE",  "POLICE.GRID" )
crime_data[cols]<-lapply(crime_data[cols], factor)

str(crime_data)
## 'data.frame':    43006 obs. of  14 variables:
##  $ CASE        : chr  "SL201624" "SL201637" "SL201656" "SL201660" ...
##  $ NCIC.CODE   : Factor w/ 339 levels "1004-0","1005-0",..: 101 262 262 262 260 262 253 336 36 36 ...
##  $ DESCRIPTION : Factor w/ 58 levels "ALCOHOL IN VEH",..: 48 13 13 13 24 13 44 43 4 4 ...
##  $ IBR         : Factor w/ 48 levels "0","09A","100",..: 19 42 42 42 48 42 1 1 6 6 ...
##  $ OCCURRED    : POSIXct, format: "2016-01-01 00:36:15" "2016-01-01 01:00:24" ...
##  $ REPORTED    : POSIXct, format: "2016-01-01 00:36:15" "2016-01-01 01:00:24" ...
##  $ DAY.OF.WEEK : Factor w/ 7 levels "1","2","3","4",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ LOCATION    : Factor w/ 3998 levels "000 100 S                                                                                               ",..: 1242 2067 2972 3938 2578 3899 2921 805 2863 1218 ...
##  $ CITY        : Factor w/ 3 levels "MURRAY","SALT LAKE CITY",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ CITY.COUNCIL: Factor w/ 9 levels "0","1","2","3",..: 3 5 5 6 5 6 4 3 5 6 ...
##  $ POLICE.ZONE : Factor w/ 6 levels "Z1","Z2","Z3",..: 2 3 3 5 4 5 4 2 4 5 ...
##  $ POLICE.GRID : Factor w/ 23 levels "111","112","113",..: 6 11 12 18 14 18 13 5 16 18 ...
##  $ X.COORDINATE: int  1882550 1890252 1891039 1891819 1894816 1893431 1903627 1882890 1895533 1889454 ...
##  $ Y_COORDINATE: int  876155 883639 883995 880413 884414 879640 891202 879376 882346 876317 ...

The data variables are now in the appropriate formats. Now let’s skim the data to check for completeness.

skim(crime_data)
## Skim summary statistics
##  n obs: 43006 
##  n variables: 14 
## 
## -- Variable type:character --------------------------------------------------------------------------------
##  variable missing complete     n min max empty n_unique
##      CASE       0    43006 43006   8  12     0    43003
## 
## -- Variable type:factor -----------------------------------------------------------------------------------
##      variable missing complete     n n_unique
##          CITY       0    43006 43006        3
##  CITY.COUNCIL       0    43006 43006        9
##   DAY.OF.WEEK       0    43006 43006        7
##   DESCRIPTION       0    43006 43006       58
##           IBR       0    43006 43006       48
##      LOCATION       0    43006 43006     3998
##     NCIC.CODE       0    43006 43006      339
##   POLICE.GRID       0    43006 43006       23
##   POLICE.ZONE       0    43006 43006        6
##                                  top_counts ordered
##           SAL: 43004, MUR: 1, SOU: 1, NA: 0   FALSE
##         4: 16810, 5: 7635, 2: 6527, 7: 3774   FALSE
##          6: 6795, 4: 6494, 5: 6439, 3: 6260   FALSE
##  LAR: 6990, PUB: 5846, DRU: 4221, ASS: 3593   FALSE
##   0: 12577, 90Z: 5417, 23C: 3079, 13B: 2636   FALSE
##   200: 1333, 300: 1238, 200: 1109, 300: 562   FALSE
##  490: 3113, 230: 3079, 131: 2230, 549: 1982   FALSE
##  132: 7205, 152: 5171, 162: 2123, 111: 2079   FALSE
##     Z3: 14495, Z5: 7131, Z6: 5884, Z1: 5783   FALSE
## 
## -- Variable type:integer ----------------------------------------------------------------------------------
##      variable missing complete     n       mean      sd      p0       p25
##  X.COORDINATE       0    43006 43006 1889922.83 8703.42 1843802 1886204  
##  Y_COORDINATE       0    43006 43006  882400.59 7072.58  862241  876879.5
##      p50     p75    p100     hist
##  1889516 1894609 1916369 <U+2581><U+2581><U+2581><U+2582><U+2587><U+2587><U+2582><U+2581>
##   884010  886684  913504 <U+2581><U+2583><U+2583><U+2587><U+2583><U+2581><U+2581><U+2581>
## 
## -- Variable type:POSIXct ----------------------------------------------------------------------------------
##  variable missing complete     n        min        max     median n_unique
##  OCCURRED       0    43006 43006 2016-01-01 2017-01-04 2016-07-04    42964
##  REPORTED       0    43006 43006 2016-01-01 2017-01-04 2016-07-04    42964

The data is ready for analysis.

Now let’s add a few more columns for analysis, such as the month, hour, and weekday on which each crime occurred.

crime_data$occurencemonth<-month(crime_data$OCCURRED)
crime_data$occurencehour<-hour(crime_data$OCCURRED)
crime_data$DAY.OF.WEEK<-weekdays.POSIXt(crime_data$OCCURRED,abbreviate = TRUE)
crime_data$DAY.OF.WEEK<-as.factor(crime_data$DAY.OF.WEEK)
crime_data$occurencemonth<-as.factor(crime_data$occurencemonth)
crime_data$occurencehour<-as.factor(crime_data$occurencehour)

Bin the variables based on Uniform Crime Reporting (UCR) statistics

The UCR Program collects statistics on the number of offenses known to law enforcement. In the traditional Summary Reporting System (SRS), there are eight crimes, or Part I offenses (murder and nonnegligent homicide, rape (legacy and revised), robbery, aggravated assault, burglary, motor vehicle theft, larceny-theft, and arson), to be reported to the UCR Program. These offenses were chosen because they are serious crimes, they occur with regularity in all areas of the country, and they are likely to be reported to police. The Part I offenses are defined as: criminal homicide, forcible rape/legacy rape, revised rape, robbery, aggravated assault, burglary (breaking or entering), larceny-theft (except motor vehicle theft), motor vehicle theft, and arson.

Our data has 58 reported categories of crime. For the efficiency of our cluster analysis, we bin the cases into the major Part I crimes (plus drug offenses) and bucket the rest as "Others".

# Offense Table based on UCR type I crimes
Homicide<-"90"
Rape<-"11"
Robbery<-"12"
Assault<-"13"
Burglary<-"22"
Larceny<-"23"
Auto.Theft<-"24"
Arson<-"20"
Drugs<-"35"

ncic_prefix <- substr(crime_data$NCIC.CODE, 1, 2)
crime_data$OffenseType <- case_when(ncic_prefix %in% Homicide ~ "Homicide",
                                    ncic_prefix %in% Rape ~ "Rape",
                                    ncic_prefix %in% Robbery ~ "Robbery",
                                    ncic_prefix %in% Assault ~ "Assault",
                                    ncic_prefix %in% Burglary ~ "Burglary",
                                    ncic_prefix %in% Larceny ~ "Larceny",
                                    ncic_prefix %in% Auto.Theft ~ "Auto.Theft",
                                    ncic_prefix %in% Arson ~ "Arson",
                                    ncic_prefix %in% Drugs ~ "Drugs",
                                    TRUE ~ "Others")
crime_data$OffenseType<-as.factor(crime_data$OffenseType)

Exploratory Data Analysis

Major crimes in 2016

group_by_crimetype<- group_by(crime_data,DESCRIPTION)

count_by_crimetype<- summarise(group_by_crimetype,n=n())
ggplot(data=subset(count_by_crimetype,n>200), aes(x=reorder(DESCRIPTION,-n),y=n))+
  geom_bar(stat="identity",fill="steelblue")+
  geom_text(aes(label=n),vjust=0.3,size=2.5)+
  coord_flip()+
  xlab("Crime Description") +
  ylab("Count of Occurences in 2016")

#ggplotly(plot)%>% layout( autosize=F)%>% config(displayModeBar = F)

The graph shows 21 of the 58 categories of crime: those with more than 200 occurrences in 2016. We see that larceny, assault, and drug offenses were the major offenses reported in 2016.

Crimes reported by police Zone

SLC has six police zones, each divided into police grids. We analyze which crimes are most frequently reported in each police zone.

group_by_policezone<- group_by(subset(crime_data,OffenseType!="Others"),POLICE.ZONE,OffenseType)
count_by_policezone<- summarise(group_by_policezone,n=n())

df_sorted <- arrange(count_by_policezone, POLICE.ZONE, OffenseType) 

p2<-ggplot(data=subset(df_sorted,n>50), aes(x=reorder(POLICE.ZONE,n),y=n,fill=OffenseType))+
  geom_bar(stat="identity")+
  coord_flip()+
  xlab("Police Zone") +
  ylab("Count of Occurences in 2016")+
  theme_minimal()
ggplotly(p2)%>% config(displayModeBar = F)

Total crimes by offense type reported in 2016

p1<-ggplot(crime_data,aes(x=OffenseType, fill=DESCRIPTION))+
  geom_bar()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplotly(p1)%>% config(displayModeBar = F)

K-Means Clustering

While we used EDA to find the rates of each type of crime, we will now use clustering to group police grids with similar crime profiles and identify those with high crime rates.

library(NbClust)

#Convert data into numerical data
library(reshape2)
Crime_Clust_Location<-dcast(subset(crime_data,OffenseType!="Others"),formula = POLICE.GRID~OffenseType,fun.aggregate = length, value.var = "OffenseType")
rownames(Crime_Clust_Location) <- Crime_Clust_Location[,1]
Crime_Clust_Location[,1] <- NULL

Crime_Clust<-scale(Crime_Clust_Location)
head(Crime_Clust)
##          Arson     Assault Auto.Theft    Burglary       Drugs   Homicide
## 111  1.4649377  1.22286642  2.2096010  0.97368999 -0.21348827  1.5095603
## 112  1.4649377  0.77059019  0.4942174 -0.33086553 -0.08685172 -0.4193223
## 113 -1.4230823 -0.05858288  0.4528828 -0.79677822 -0.27884907 -0.4193223
## 121 -0.4604090  0.76116777  1.6102501  0.01080377 -0.21348827 -0.4193223
## 122  0.5022643 -0.45432457  0.1222065 -0.57935230 -0.27067897 -0.4193223
## 123 -1.4230823 -0.29414341 -0.2291371 -1.10738667 -0.30948695 -0.4193223
##        Larceny     Robbery
## 111 -0.1783812  0.04134139
## 112  0.2252782  1.36196920
## 113 -0.5158669 -0.59255995
## 121 -0.3769022 -0.27560928
## 122 -0.5787319 -0.53973484
## 123 -0.2842591 -0.43408462

The data is reshaped to include only the major crime categories, with the police grid numbers as row names. The data is then scaled across all major offenses.

Choosing optimal number of clusters

Elbow Method

This method is based on total within-cluster variation: k-means minimizes the total within-cluster sum of squares (WSS), and we look for the "elbow" where adding more clusters stops reducing WSS appreciably.

library(factoextra)

fviz_nbclust(Crime_Clust,kmeans,method="wss")+
  labs(subtitle = "Elbow Method")
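To make the elbow criterion concrete, here is a small self-contained illustration on synthetic two-group data (not the crime data): the WSS drops sharply up to the true number of groups and then flattens.

```r
# Illustration of the elbow criterion on synthetic data (not the crime data):
# WSS falls steeply until k reaches the true number of groups, then flattens.
set.seed(42)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2))
wss <- sapply(1:6, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)
round(wss, 1)  # big drop from k = 1 to k = 2, small changes afterwards
```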

Silhouette Method

This method measures how well each object lies within its cluster; a high average silhouette width indicates a good clustering.

fviz_nbclust(Crime_Clust,kmeans,method="silhouette")+
  labs(subtitle = "Silhouette Method")
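The average silhouette width can also be computed directly with the recommended cluster package. A self-contained sketch on the same kind of synthetic two-group data (assumption: the cluster package is available):

```r
# Average silhouette width for k = 2 vs k = 3 on synthetic two-group data:
# the true k = 2 yields the higher (better) average width.
library(cluster)  # recommended package, ships with R
set.seed(42)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2))
avg_sil <- function(k) {
  km <- kmeans(toy, centers = k, nstart = 25)
  mean(silhouette(km$cluster, dist(toy))[, "sil_width"])
}
sapply(2:3, avg_sil)  # k = 2 scores higher
```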

Gap Statistic Method

Here, we choose the smallest value of k such that the gap statistic is within one standard deviation of the gap at k+1.

fviz_nbclust(Crime_Clust,kmeans,method="gap_stat",nboot = 50)+
  labs(subtitle = "Gap statistic method")

The last two methods clearly identify 2 as the optimal number of clusters.

Generate Cluster Centers

clusters<-kmeans(Crime_Clust,2,nstart=25)
clusters$centers
##        Arson    Assault Auto.Theft    Burglary      Drugs   Homicide
## 1 -0.0478347 -0.2367115 -0.1090701 -0.06462971 -0.2304121 -0.1437677
## 2  0.5022643  2.4854709  1.1452365  0.67861196  2.4193265  1.5095603
##      Larceny    Robbery
## 1 -0.2231272 -0.2655474
## 2  2.3428358  2.7882472
ClusterResult<-data.frame(clusters$centers)

Summarize Cluster Results into meaningful information

summarizeClusterResult <- function(colno){
  case_when(
    ClusterResult[,colno] < quantile(ClusterResult[,colno], 0.25) ~ "Low",
    ClusterResult[,colno] >= quantile(ClusterResult[,colno], 0.25) &
      ClusterResult[,colno] < quantile(ClusterResult[,colno], 0.75) ~ "Medium",
    ClusterResult[,colno] >= quantile(ClusterResult[,colno], 0.75) ~ "High"
  )
}
for (i in 1:8 ){
  ClusterResult[,i]<-summarizeClusterResult(i)
}
library(formattable)
crime_formatter <- formatter("span", style = x ~ ifelse(
  x == "High", "color:red",
  ifelse(x == "Medium", "color:orange", "color:green")))
ClusterResult <- formattable(ClusterResult, list(
  Arson = crime_formatter, Assault = crime_formatter,
  Auto.Theft = crime_formatter, Burglary = crime_formatter,
  Drugs = crime_formatter, Homicide = crime_formatter,
  Larceny = crime_formatter, Robbery = crime_formatter))
ClusterResult
  Arson Assault Auto.Theft Burglary Drugs Homicide Larceny Robbery
1   Low     Low        Low      Low   Low      Low     Low     Low
2  High    High       High     High  High     High    High    High

The results above show which crimes dominate each cluster: one cluster is low across all major offenses, while the other is high across the board. We will now add the cluster assignment to our earlier cluster table, which holds the frequency of each crime in the various police grids.

Crime_Clust_Location$Cluster <- (clusters$cluster)

Visualizing Clusters

The graph below shows which grids fall into each cluster.

str(clusters)
## List of 9
##  $ cluster     : Named int [1:23] 1 1 1 1 1 1 1 1 2 1 ...
##   ..- attr(*, "names")= chr [1:23] "111" "112" "113" "121" ...
##  $ centers     : num [1:2, 1:8] -0.0478 0.5023 -0.2367 2.4855 -0.1091 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "1" "2"
##   .. ..$ : chr [1:8] "Arson" "Assault" "Auto.Theft" "Burglary" ...
##  $ totss       : num 176
##  $ withinss    : num [1:2] 86.3 24.8
##  $ tot.withinss: num 111
##  $ betweenss   : num 64.8
##  $ size        : int [1:2] 21 2
##  $ iter        : int 1
##  $ ifault      : int 0
##  - attr(*, "class")= chr "kmeans"
# plot against the scaled data the model was fit on
fviz_cluster(clusters, data = Crime_Clust,
             repel = TRUE)

Now let’s assign the cluster labels to the parent crime data table. But first, let’s convert the State Plane coordinates into latitude and longitude.

library(rgdal)
nad83_coords <- data.frame(lon=crime_data$X.COORDINATE,lat=crime_data$Y_COORDINATE)
coordinates(nad83_coords) <- c("lon", "lat")
proj4string(nad83_coords)=CRS("+init=esri:102285")
x<-data.frame(spTransform(nad83_coords,CRS("+init=epsg:4326")))
crime_data$lon<-x$lon
crime_data$lat<-x$lat
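Note that rgdal has since been retired; the same conversion can be done with the sf package. A self-contained sketch on a single toy point (assumptions: sf is installed, and the coordinates are State Plane Utah Central feet, ESRI:102285, as in the rgdal call above):

```r
# Alternative conversion using sf (rgdal has been retired); assumes the sf
# package is installed. One toy point in State Plane Utah Central (feet).
library(sf)
toy <- data.frame(x = 1890000, y = 882000)
pts <- st_as_sf(toy, coords = c("x", "y"), crs = "ESRI:102285")
ll  <- st_coordinates(st_transform(pts, 4326))
ll  # approximately the longitude/latitude of Salt Lake City
```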


Crime_Clust_Location$POLICE.GRID<-rownames(Crime_Clust_Location)
Crime_Clust_Location$POLICE.GRID<-as.factor(Crime_Clust_Location$POLICE.GRID)
crime_data$cluster<-Crime_Clust_Location$Cluster[match(crime_data$POLICE.GRID,Crime_Clust_Location$POLICE.GRID)]
crime_data$cluster <- paste("Cluster",crime_data$cluster, sep=" ")
crime_data$cluster<-as.factor(crime_data$cluster)
names(crime_data)
##  [1] "CASE"           "NCIC.CODE"      "DESCRIPTION"    "IBR"           
##  [5] "OCCURRED"       "REPORTED"       "DAY.OF.WEEK"    "LOCATION"      
##  [9] "CITY"           "CITY.COUNCIL"   "POLICE.ZONE"    "POLICE.GRID"   
## [13] "X.COORDINATE"   "Y_COORDINATE"   "occurencemonth" "occurencehour" 
## [17] "OffenseType"    "lon"            "lat"            "cluster"

Analysis of Clusters

Clusters grouped by offense type

group_by_cluster<- group_by(subset(crime_data,OffenseType!="Others"),cluster,OffenseType)
count_by_cluster<- summarise(group_by_cluster,n=n())

df_sorted <- arrange(count_by_cluster, cluster, OffenseType) 

p4<-ggplot(data=subset(df_sorted), aes(x=reorder(cluster,n),y=n,fill=OffenseType))+
  geom_bar(stat="identity")+
  coord_flip()+
  xlab("Crime Cluster") +
  ylab("Count of Occurences in 2016")+
  theme_minimal()
ggplotly(p4)%>% config(displayModeBar = F)

Crime by Weekday in Cluster Area

ggplot(subset(crime_data,OffenseType!="Others"),aes(x=DAY.OF.WEEK,fill=OffenseType))+
  geom_bar()+
  facet_wrap(~cluster,ncol = 1)

Logit Regression

We chose logit over SVC because we wanted to find areas with a high probability of crime for a specific hour and day of the week. However, upon running logistic regression with a 70-30 train-test split, the top 5 police grids for a specific crime remained the same irrespective of the hour of the day or day of the week. It is possible that additional predictor variables are required to build an effective prediction model. Hence, we restricted our analysis to clustering.
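The logit experiment described above is not shown in the code. A minimal, hypothetical sketch of the approach, using synthetic data standing in for crime_data (the grid labels, base rates, and variable choices here are illustrative assumptions, not our actual results):

```r
# Hypothetical sketch of the logit experiment: a 70-30 split and a binomial
# glm predicting whether a case is Larceny from grid, hour and weekday.
# Synthetic data stands in for crime_data; grid "132" is made larceny-heavy.
set.seed(123)
n <- 2000
toy <- data.frame(
  POLICE.GRID   = factor(sample(c("111", "132", "152"), n, replace = TRUE)),
  occurencehour = factor(sample(0:23, n, replace = TRUE)),
  DAY.OF.WEEK   = factor(sample(c("Mon", "Tue", "Wed"), n, replace = TRUE))
)
toy$is_larceny <- rbinom(n, 1, ifelse(toy$POLICE.GRID == "132", 0.45, 0.15))

train_idx <- sample(n, 0.7 * n)                      # 70-30 split
fit <- glm(is_larceny ~ POLICE.GRID + occurencehour + DAY.OF.WEEK,
           data = toy[train_idx, ], family = binomial)
p <- predict(fit, newdata = toy[-train_idx, ], type = "response")
# Grids ranked by mean predicted probability; the top grids stay the same
# regardless of hour or weekday, mirroring what we observed on the real data.
sort(tapply(p, toy$POLICE.GRID[-train_idx], mean), decreasing = TRUE)
```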